Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to new package index, decompress into .tar in javascript #2024

Merged
merged 4 commits into from
May 7, 2024

Conversation

garrettgu10
Copy link
Collaborator

@garrettgu10 garrettgu10 commented Apr 15, 2024

how to test: bazel build //... && time ./bazel-bin/src/workerd/server/workerd test samples/pyodide-langchain/config.capnp | ts
Without this change: 27s loading packages, 155s CPU time, 63s total wall time
With this change: 1s loading packages, 100s CPU time, 40s total wall time

@garrettgu10 garrettgu10 force-pushed the ggu/tar-gz-wheels branch 2 times, most recently from 0ffbb12 to 653acab Compare April 16, 2024 17:09
@garrettgu10 garrettgu10 marked this pull request as ready for review April 16, 2024 21:49
@garrettgu10 garrettgu10 requested review from a team as code owners April 16, 2024 21:49
@garrettgu10 garrettgu10 requested a review from dom96 April 17, 2024 15:46
@garrettgu10 garrettgu10 requested a review from a team as a code owner April 17, 2024 18:25
Copy link
Collaborator

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comments, also can you describe how you've tested this in the PR description for future reference?

Some changes needed but accepting to unblock.

Comment on lines +40 to +58
class ArrayBufferReader {
constructor(arrayBuffer) {
this.arrayBuffer = arrayBuffer;
}

read(offset, buf){
// buf is a Uint8Array
const size = this.arrayBuffer.byteLength;
if (offset >= size || offset < 0) {
return 0;
}
let toCopy = buf.length;
if (size - offset < toCopy) {
toCopy = size - offset;
}
buf.set(new Uint8Array(this.arrayBuffer, offset, toCopy));
return toCopy;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should tar.js instead be changed to expect an ArrayBuffer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, then we would have to expose an ArrayBuffer instead of a PackagesTarReader, not sure if we want that big of a refactor. This interface is meant to match exactly what PackagesTarReader does in JSG.

loading.push(req);
}

console.log("Loading " + loading.join(", "));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these console.logs be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept them there to match the original prints from the pyodide package loader. I'd prefer to keep them for now so we have more visibility on performance in testing.


console.log("Loading " + loading.join(", "));

await Promise.all(loadPromises).then((buffers) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
await Promise.all(loadPromises).then((buffers) => {
const buffers = await Promise.all(loadPromises);

@@ -194,6 +195,7 @@ async function prepareWasmLinearMemory(Module) {
}

export async function loadPyodide(lockfile, indexURL) {
console.log("loading pyodide");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
console.log("loading pyodide");

}
});

console.log("Loaded " + loading.join(", "));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
console.log("Loaded " + loading.join(", "));

loading.push(req);
}

console.log("Loading " + loading.join(", "));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
console.log("Loading " + loading.join(", "));


/**
* A small bundle contains just a single package. The entire bundle will be overlaid onto site-packages.
* A small bundle can basically be thought of as a wheel.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it just be called addSingleWheel then?

* @param {List<String>} soFiles A list of .so files contained in the big bundle
* @param {List<String>} requirements canonicalized list of packages to pick from the big bundle
*/
addBigBundle(tarInfo, soFiles, requirements) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we have this split. Isn't addSmallBundle just the same as addBigBundle with a single requirement? Why separate them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure of a big bundle is different from a small bundle. A big bundle contains one folder for each package, with each folder containing the structure of a wheel. A small bundle contains just the wheel for a single package, not contained within a folder.


let LOAD_WHEELS_FROM_R2 = true;
let requirementsInBigBundle = new Set([...STDLIB_PACKAGES]);
if(bigTarInfo.children.size > 10) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if(bigTarInfo.children.size > 10) {
if (bigTarInfo.children.size > 10) {

}

function disabledLoadPackage() {
throw new Error("We only use loadPackage in workerd");
throw new Error("pyodide.loadPackage is disabled");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new Error("pyodide.loadPackage is disabled");
throw new Error("pyodide.loadPackage is disabled because packages are embedded in the binary");

@garrettgu10 garrettgu10 merged commit 9bb3e21 into main May 7, 2024
10 checks passed
garrettgu10 added a commit that referenced this pull request May 13, 2024
* switch to new package index, decompress into .tar in javascript

* switch to new index, perform full package installation in JavaScript

* Address nits

* Fix circular import issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants